DOCUMENT RESUME



ED 088 948 



TM 003 513



AUTHOR        Chen, Martin K.
TITLE         Outcome Measures of Health Programs: What and How?
PUB DATE      Apr 74
NOTE          12p.; Paper presented at the American Educational
              Research Association annual meeting (Chicago,
              Illinois, April 15-19, 1974)
EDRS PRICE    MF-$0.75 HC-$1.50
DESCRIPTORS   Evaluation; *Health; *Health Programs; *Measurement;
              *Models; Research Design; Statistical Analysis



ABSTRACT

With the proliferation of new health programs, such as Health
Maintenance Organizations (HMO's) and Professional Standards Review
Organizations (PSRO's), the task of evaluating the impact of such
programs on the health delivery systems and on the health of the
American people becomes more urgent. Thus far no experimental or
quasi-experimental designs have been found that are both feasible and
satisfactory. Some quasi-experimental designs, particularly interrupted
time series, have been suggested as a possible solution to the problem.
The strengths and weaknesses of this and other designs, as well as the
statistical problems associated with them, are discussed. (Author)
OUTCOME MEASURES OF HEALTH PROGRAMS: WHAT AND HOW?

Martin K. Chen 
National Center for Health Services Research 

Rockville, Maryland 

There are two basic problems confronting policymakers in the organization
and delivery of health services. They are (1) the measurement of health
in quantitative terms, and (2) the establishment of a causal nexus between
a health program and the measure of health as an outcome of the program.
Until these problems are successfully resolved, decisions in the arena of
health care must of necessity be based on evidence other than hard data
on health status. Some health administrators base their decisions on such
indicators as the rate of utilization of different types of services, the
cost of operation, and consumer satisfaction with services. To be sure,
all these indicators tell something about the quality of the program. In
the final analysis, however, a health program cannot be said to have
fulfilled the requirements of society unless indisputable evidence is
generated that the program has improved the health status of the community
it aims to serve. In other words, the most important outcome measure of a
health program should be health status.

But what is health status? Is it the mortality rate? The morbidity rate?
A combination of mortality and morbidity? Freedom from physical and mental
dysfunctions? A positive feeling of well-being? Predisposition to illness?
Health is, of course, all of this and more. Depending on one's orientation,
health has been defined in many ways that involve one or more of these
aspects. The only definition that is designed to be comprehensive and to
encompass the totality of health as it is generally conceived is that of
the World Health Organization (WHO). (1) However, this definition, that
"health is a state of complete physical, mental and social well-being and
not merely the absence of disease or infirmity," is considered by the WHO
to be too imprecise to lend itself to objective measurement. (2)

In spite of the conceptual and definitional difficulties just described,
many health indicators have been developed. Explicitly or implicitly, the
authors of these indicators have based their developmental work on
operational definitions of health that differ in varying degrees from one
another in orientation, emphasis, and conciseness of terms. To facilitate
the study of existing health indicators and of those yet to be developed,
Chen (3) has designed a classification model by which the indicators can
be properly categorized for various purposes, including the establishment
of a clearinghouse for health indicators.

SUGGESTED CLASSIFICATION MODEL 
Based in part on the classification scheme of Baumann (4), this
classification model possesses the following characteristics:

(1) It is flexible enough to cover all of the essential dimensions 
on which the health indicators can be differentiated; 

(2) The dimensions are non-overlapping and independent; that is to
say, the scale of any one dimension is independent of the scales
of all the other dimensions in the model;

(3) The model is exhaustive in the sense that all health indicators 
that have been developed and will be developed in the future can 
be fitted into the proper categories of the model; and 

(4) The designation of each category can be uniquely determined for
easy manual or computer storage and retrieval; in other words,
the categories of indicators generated by this model are
mutually exclusive.

To simplify exposition, only three dimensions are used in this model.
There is nothing sacrosanct about the number "three"; this number is used
because a three-dimensional model can be represented geometrically for
visual inspection, whereas anything above three dimensions involves
hyperspace, which is easily represented by algebra but not by Euclidean
geometry. Furthermore, the number of steps along each dimension is
flexible, since the dimensional scales are nominal and have no ordinal
value. However, it must be remembered that the total number of categories
is the product of the numbers of steps of all the dimensions, and this
number can be staggering if the number of dimensions and the number of
steps along each dimension become too large. The law of parsimony demands
that the minimum number of dimensions and steps adequate to do the job be
taken as the optimum.
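As a quick check on the arithmetic, the number of categories is simply the product of the steps per dimension. The Python sketch below is illustrative only; the function name is hypothetical, and the five-dimension, six-step configuration is an invented case chosen to show how quickly the count grows.

```python
from math import prod

def n_categories(steps_per_dimension):
    """Total number of cells: the product of the steps on every dimension."""
    return prod(steps_per_dimension)

# The model in the text: 3 orientation x 2 utility x 3 measurement steps.
print(n_categories([3, 2, 3]))   # 18
# A hypothetical five-dimensional model with six steps on each dimension.
print(n_categories([6] * 5))     # 7776
```

Adding dimensions multiplies the count rather than adding to it, which is why parsimony matters here.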

THREE DIMENSIONS OF CLASSIFICATION MODEL 
The three dimensions are utility, measurement and orientation. By utility
is meant whether an indicator is applicable to an individual or to a
community or nation. The adoption of this two-category dimension is by
design. It is assumed that an indicator that applies to a community is
also applicable to a nation. Furthermore, the family or household is left
out because any indicator that is applicable to an individual can be used
with a family when it is aggregated in some fashion. These assumptions are
used for simplification and are not necessary for the validity of the
model. For example, one could adopt a finer gradation by creating
categories such as: the individual; the family; both individual and
family; the community; the nation; and both community and nation. These
six categories fairly exhaust the possibilities of the utility dimension.
The measurement dimension is composed of three steps or categories:
based on observed data, based on self-reporting data, and based on both
observed data and self-reporting data. By observed data is meant data
obtained through observation of the subjects, not through personal or
questionnaire interview of the subjects. On the other hand, all data
obtained through the personal or questionnaire interview are considered
self-reporting data.

It is recognized that a good interviewer obtains both self-reporting data
and observational data, but these two types of data are easily
differentiated, and it is still possible to dichotomize the data into the
two types for the purpose of the model. Where they cannot be easily
differentiated, the data are classified as both self-reported and observed.

One important distinction must be made to prevent confusion. Scores on
intelligence, aptitude, or achievement tests are observed data, not
self-reported data. This is so because the performance of a subject on
any of these types of tests is evaluated by some objective criterion or
criteria. For example, if in an arithmetic test a subject gives the answer
"four" to the question, "What is two and two?", then we have the observed
datum that he knows the answer to the question. Contrast the test with the
questionnaire item, 'Do you know the answer to the question, "What is two
and two?"' The datum, in the form of a yes or no, is self-reported because
all we have is the subject's word for it. While this is an oversimplified
example, it does accentuate the difference between genuine tests and
so-called psychological "tests," such as the Minnesota Multiphasic
Personality Inventory (MMPI) and the California Psychological Inventory
(CPI), which are not tests at all because they produce self-reported data
rather than observed data.

The third dimension is orientation, using Baumann's classification scheme.
Again, for simplicity of exposition, only three categories are used:
feeling state orientation, symptom orientation and performance orientation.
To be exhaustive one would have to add all the possible combinations of the
three categories, including feeling state and symptom orientation, feeling
state and performance orientation, symptom and performance orientation, and
feeling state, symptom and performance orientation. It is entirely possible
that health indicators fitting all these categories will be constructed in
the future, but the three categories should accommodate most of the indices
extant.
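The exhaustive set described above is simply every non-empty combination of the basic categories, 2^3 - 1 = 7 for orientation. The Python sketch below (all names are illustrative) generates them in the order the paragraph lists them and, as a hypothetical case, counts the cells if both this finer orientation scale and the six-category utility scale mentioned earlier were adopted.

```python
from itertools import combinations

BASIC_ORIENTATIONS = ("feeling state", "symptom", "performance")

# The six utility categories suggested in the text as a finer gradation.
FINER_UTILITY = ("individual", "family", "individual and family",
                 "community", "nation", "community and nation")

def exhaustive_categories(basic):
    """Every non-empty combination of the basic categories, smallest first."""
    return [" and ".join(combo)
            for k in range(1, len(basic) + 1)
            for combo in combinations(basic, k)]

orientations = exhaustive_categories(BASIC_ORIENTATIONS)
print(len(orientations))     # 7, i.e. 2**3 - 1
print(orientations[-1])      # feeling state and symptom and performance
# Cells with both finer scales and the three measurement steps:
print(len(orientations) * len(FINER_UTILITY) * 3)   # 126
```

The jump from 18 to 126 cells illustrates the cost of exhaustiveness that the law of parsimony warns against.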

With the three dimensions as described, the classification model can be
geometrically represented as a cube, with M symbolizing the measurement
dimension, U the utility dimension and O the orientation dimension. In the
measurement dimension, M1 or SR signifies self-reporting, M2 or OB
observation, and M3 or SR-OB both self-reporting and observation. In the
utility dimension, U1 or IN signifies individual, and U2 or NA community or
nation. In the orientation dimension, O1 or FE represents feeling state
orientation, O2 or SY symptom orientation, and O3 or PE performance
orientation. This model is shown in Figure 1.
Figure 1. Schematic Representation of Classification Model
To illustrate how this classification scheme is used, the cube OUM312
(the three digits give the steps on the O, U and M dimensions, in that
order), marked Y, represents the cell into which fall all indicators that
are performance oriented (O3), that are used with individuals (U1), and
that are based on observed data (M2). One such indicator would be Katz's
Index of Activities of Daily Living, or ADL (5). The Apgar Index for the
Newborn (6) would fit in the X cube, OUM212, next to the Y cube, because
it is symptom oriented, it is used with individuals, and it is based on
observed data. The total number of cells or categories is, in this
particular case, 3 × 2 × 3 = 18.
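A cell label can be built mechanically from an indicator's three attributes. This Python sketch assumes, as the text's own examples imply, that the three digits after "OUM" give the O, U and M steps in that order; the dictionary and function names are hypothetical.

```python
# Step numbers as defined in the text: O1-O3, U1-U2, M1-M3.
ORIENTATION = {"feeling state": 1, "symptom": 2, "performance": 3}
UTILITY = {"individual": 1, "community or nation": 2}
MEASUREMENT = {"self-reported": 1, "observed": 2,
               "self-reported and observed": 3}

def cell_code(orientation: str, utility: str, measurement: str) -> str:
    """Return the OUM cell label, with digits in O, U, M order."""
    return (f"OUM{ORIENTATION[orientation]}"
            f"{UTILITY[utility]}{MEASUREMENT[measurement]}")

print(cell_code("performance", "individual", "observed"))  # OUM312 (Katz ADL)
print(cell_code("symptom", "individual", "observed"))      # OUM212 (Apgar)
```

Because each attribute maps to exactly one step, every indicator receives exactly one label; an attribute outside the model raises a `KeyError` rather than being silently misfiled.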
The 18 categories are: OUM111, OUM112, OUM113, OUM121, OUM122, OUM123,
OUM211, OUM212, OUM213, OUM221, OUM222, OUM223, OUM311, OUM312, OUM313,
OUM321, OUM322 and OUM323. OUM111 would include indicators that are
feeling state oriented, that apply to individuals, and that are based on
self-reported data. OUM223 would comprise indices that are symptom
oriented, that are applicable to a community or a nation, and that are
based on both observed data and self-reported data. The other letter
combinations can be similarly interpreted.
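Listing and decoding all 18 labels, as one might for the computer storage and retrieval mentioned in characteristic (4), can be sketched as follows in Python. The label tables restate the definitions given with Figure 1; the function names are hypothetical, and the decoder assumes single-digit steps.

```python
from itertools import product

# Labels in step order; position i holds step i + 1.
ORIENTATION = ("feeling state", "symptom", "performance")
UTILITY = ("individual", "community or nation")
MEASUREMENT = ("self-reported", "observed", "self-reported and observed")

def all_cells():
    """List every OUM label, varying the M digit fastest."""
    return [f"OUM{o}{u}{m}"
            for o, u, m in product(range(1, len(ORIENTATION) + 1),
                                   range(1, len(UTILITY) + 1),
                                   range(1, len(MEASUREMENT) + 1))]

def describe(code: str) -> tuple:
    """Decode a label such as 'OUM223' into its three category names."""
    o, u, m = (int(digit) for digit in code[3:])
    return ORIENTATION[o - 1], UTILITY[u - 1], MEASUREMENT[m - 1]

cells = all_cells()
print(len(cells), cells[0], cells[-1])   # 18 OUM111 OUM323
print(describe("OUM223"))
```

The enumeration reproduces the order of the list above, and `describe("OUM223")` returns the symptom, community-or-nation, both-data-types reading given in the text.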

HEALTH STATUS AS PROGRAM OUTCOME 
Assuming that a valid and reliable health status indicator has been
developed, the problem of linking changes in health status with a health
delivery system in a community still remains. This is so because any new
health program implemented in a community constitutes only one type of
input into an open system on which a multitude of known and unknown
factors also impinge. To isolate the effect of the program from the effects
of these confounding factors is an extremely difficult, if not impossible,
task without experimental manipulations and/or controls. Since a new health
program is usually unique in many respects, it is generally impossible to
obtain a control comparable to the new program. Even if a comparable
control were obtainable, it would not help much, because consumers of
services from the two programs could not be randomly assigned. Without
randomization of consumers, it would be impossible to control systematic
differences that might exist between the two groups, and these systematic
differences would then be confounded with differences in program effect.

The difficulties just described are, of course, not unique to health
services research. They are confronted by all social scientists with an
interest in the evaluation of social action programs. These difficulties
are discussed in considerable detail by Suchman (7). Most researchers
with some knowledge of experimental design and statistics are familiar
with these problems, but in most cases they have to work in situations
where they can do very little, if anything at all, about them.

On the other hand, there are some naive researchers who are undaunted by
the complexities of measuring change in a social setting. These are the
people who are not familiar with the classic Problems in Measuring Change
(8) and take the cavalier view that a simple before-after design suffices
for the evaluation of a health program. Statistically, they subtract the
pre-scores from the post-scores and test the gain scores for significance.
They are not aware of the fact that gain scores are notoriously low in
reliability, and that the use of gain scores, which Cox (9) terms an index
of response, requires the assumption that the regression of the post-data
on the pre-data is linear, with a slope of unity. In most cases this
assumption is not valid, and considerable doubt is cast on the findings.
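The low reliability of gain scores can be made concrete with the classical-test-theory formula for the reliability of a difference D = Y - X, which assumes that the measurement errors of the pre-score X and post-score Y are uncorrelated. The Python function below is a sketch of that formula under this assumption; the input values are hypothetical, not taken from any study.

```python
def gain_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of D = Y - X under classical test theory.

    r_xx, r_yy: reliabilities of the pre-score X and post-score Y.
    r_xy: observed correlation between X and Y.
    Assumes the errors of measurement in X and Y are uncorrelated.
    """
    true_var = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * r_xy * sd_x * sd_y
    total_var = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
    return true_var / total_var

# Two reliable measures (0.8 each) that correlate 0.7 with each other:
print(round(gain_score_reliability(0.8, 0.8, 0.7), 3))   # 0.333
# The same measures if pre and post were uncorrelated:
print(round(gain_score_reliability(0.8, 0.8, 0.0), 3))   # 0.8
```

Two measures that are each reasonably reliable yield a gain score with reliability of only about one third when they correlate 0.7, and the higher the pre-post correlation, the worse the gain score becomes.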

There are other ways of analyzing pre-post data, such as using the
difference between the post-score and the score predicted from the
regression of post-scores on pre-scores, and using the pre-scores as the
covariate in an analysis of covariance of the post-scores. While these
procedures are an improvement over the simple analysis of gain scores,
they are no more valid than the latter in establishing causality between
a health program and the outcome measures. Whatever statistical procedures
are used, a significant difference only means that a change has occurred;
it does not automatically imply that the change is due to the health program.
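The difference between a gain-score analysis and a covariance adjustment shows up clearly in a small simulation. The Python sketch below uses entirely synthetic data: the program group is assumed to start with higher pre-scores (no randomization), the true regression slope of post on pre is assumed to be 0.5 rather than 1, and the true program effect is set to 3. The gain-score estimate implicitly fixes the slope at 1; the covariance-adjusted estimate uses the pooled within-group slope instead. All names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate_group(rng, n, pre_mean, effect, slope=0.5):
    """Synthetic data: post = 10 + slope * pre + effect + noise."""
    pre = [rng.gauss(pre_mean, 10.0) for _ in range(n)]
    post = [10.0 + slope * x + effect + rng.gauss(0.0, 5.0) for x in pre]
    return pre, post

def pooled_within_slope(groups):
    """Common within-group slope of post on pre, as used in ANCOVA."""
    sxy = sxx = 0.0
    for pre, post in groups:
        mx, my = mean(pre), mean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    return sxy / sxx

rng = random.Random(42)
comparison = simulate_group(rng, 5000, pre_mean=50.0, effect=0.0)
program = simulate_group(rng, 5000, pre_mean=60.0, effect=3.0)  # true effect 3

diff_pre = mean(program[0]) - mean(comparison[0])
diff_post = mean(program[1]) - mean(comparison[1])

gain_estimate = diff_post - diff_pre           # implicitly assumes slope = 1
b_w = pooled_within_slope([comparison, program])
ancova_estimate = diff_post - b_w * diff_pre   # uses the estimated slope

print(round(gain_estimate, 2), round(b_w, 2), round(ancova_estimate, 2))
```

With these settings the gain-score analysis should even get the sign of the effect wrong, while the adjusted estimate lands near 3. That success is built into the simulation, which contains no confounder other than the pre-score; real data give no such guarantee, so the adjusted difference still shows only that a change occurred, not that the program caused it.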
STOCHASTIC MODELS IN QUASI-DESIGNS
Much more promising as a statistical procedure for measuring change over
time is the application of stochastic process models to non-stationary
time series data. Where a control group is not feasible, collecting data
from the same group at regular intervals has the effect of using the
group as its own control. This design can be schematically represented as
follows:

Figure 2. Hypothetical Representation of a Non-Stationary Time Series
In this design the dependent variable is health status, and the
independent variable is the absence or presence of a new health program,
with the arrow symbolizing the dividing line. Health status data are
collected at fixed intervals from the same sample eight times, four
before and four after the introduction of the program. Statistically, it
is tempting to use the time periods, assigned some arbitrary values such
as 1, 2, 3 and 4, as the independent variable and to regress health
status on this variable separately for the time periods before and after
the introduction of the health program, and then compare the two
intercepts and the two slopes. This, however, would not be a legitimate
procedure because of the autocorrelation of the observations: repeated
measurements on the same group at successive times are not independent,
as ordinary regression assumes.
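The objection can be checked numerically. The Python sketch below uses synthetic data; the seed, the series length and the assumption that every past shock stays in the level permanently (a random walk) are all illustrative choices. It simulates a series with no program effect at all, fits the tempting straight-line regression separately to each half, and computes the lag-1 autocorrelation of the residuals, which ordinary regression assumes to be near zero. The function names are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares intercept and slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def lag1_autocorrelation(r):
    """Sample lag-1 autocorrelation of a sequence of residuals."""
    m = sum(r) / len(r)
    num = sum((r[t] - m) * (r[t - 1] - m) for t in range(1, len(r)))
    den = sum((v - m) ** 2 for v in r)
    return num / den

def simulate_series(rng, n, level=50.0, gamma=1.0, sigma=1.0):
    """z_t = level + gamma * (a_1 + ... + a_{t-1}) + a_t, no change in level."""
    z, running = [], 0.0
    for _ in range(n):
        a = rng.gauss(0.0, sigma)
        z.append(level + gamma * running + a)
        running += a
    return z

rng = random.Random(7)
z = simulate_series(rng, 400)
times = list(range(1, len(z) + 1))
results = {}
for label, seg in (("pre", slice(0, 200)), ("post", slice(200, 400))):
    x, y = times[seg], z[seg]
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    results[label] = (b1, lag1_autocorrelation(resid))
    print(label, "slope", round(b1, 3),
          "lag-1 autocorrelation", round(results[label][1], 2))
```

For a series like this the residual autocorrelations should come out far above zero, which is exactly the dependence the separate regressions ignore; their standard errors, and any comparison of the two slopes, would therefore be misleading.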
For this type of data, the model and statistical techniques of Box and
Tiao (10) appear the most appropriate. This is the integrated moving
average model represented by the two equations:

$$z_t = L + \gamma \sum_{j=1}^{t-1} a_j + a_t, \qquad t = 1, \ldots, n_1 \qquad (1)$$

for the $n_1$ observations before the introduction of a program, and

$$z_t = L + \delta + \gamma \sum_{j=1}^{t-1} a_j + a_t, \qquad t = n_1 + 1, \ldots, n_1 + n_2 \qquad (2)$$

for the $n_2$ observations following the introduction of the program, where:

$z_t$ is the value of the dependent variable at time $t$,
$L$ is a fixed but unknown location parameter,
$\gamma$ is a parameter describing the degree of interdependence of the
observed values of the dependent variable in the time series, and takes
values $0 \le \gamma < 2$,
$a_t$ is a random normal variate with mean 0 and variance $\sigma^2$, and
$\delta$ is the change in level of the time series.

Inspection of (1) and (2) shows that the pre series is composed
essentially of random shocks, a proportion of which (a fraction $\gamma$ of
each shock) is accounted for by the non-independence of the time periods
and assimilated into the level of the series. For the post series, a new
parameter, $\delta$, is introduced to account for the change in level,
presumably due to program effect. In either case the effects of the shocks
are linearly cumulative up to and including the last time period.
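Taking first differences of (1) and (2) shows concretely how part of each shock is assimilated into the level. The derivation below uses only (1) and (2); the shorthand $\theta = 1 - \gamma$, the running level $\ell_t$ (with $\ell_0 = L$) and $N = n_1 + n_2$ are notation introduced here for convenience.

$$
\begin{aligned}
z_t - z_{t-1} &= \Big(L + \gamma \sum_{j=1}^{t-1} a_j + a_t\Big) - \Big(L + \gamma \sum_{j=1}^{t-2} a_j + a_{t-1}\Big) \\
              &= a_t - (1 - \gamma)\,a_{t-1} = a_t - \theta\,a_{t-1}, \qquad 2 \le t \le n_1, \\
z_{n_1+1} - z_{n_1} &= \delta + a_{n_1+1} - \theta\,a_{n_1}, \\
\ell_t &= L + \gamma \sum_{j=1}^{t} a_j = \ell_{t-1} + \gamma\,a_t, \qquad z_t = \ell_{t-1} + a_t \quad (t \le n_1).
\end{aligned}
$$

For $t > n_1 + 1$ both $z_t$ and $z_{t-1}$ contain $\delta$, so it cancels and the differences again take the form $a_t - \theta a_{t-1}$. Each observation is thus the previous level plus a fresh shock, and a fraction $\gamma$ of that shock is then absorbed into the level: $\gamma = 0$ gives independent scatter about a fixed level $L$, and $\gamma = 1$ gives a random walk. The program effect $\delta$ appears only as a single jump in the differenced series at $t = n_1 + 1$.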

While the logic of the model is extremely simple, the computations are
not. This is so because the values of $L$ and $\delta$ must be estimated
from the data, using as the model in matrix notation

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \qquad (3)$$

where $\mathbf{X}$ is an $N \times 2$ matrix of weights (with
$N = n_1 + n_2$, and with weights that depend on $\gamma$),
$\boldsymbol{\beta}$ a $2 \times 1$ vector with $L$ and $\delta$ as its
elements, and $\mathbf{e}$ an $N \times 1$ vector of random elements with
mean 0 and variance $\sigma^2$. If the value of $\gamma$ is known, the least
squares estimate of $\boldsymbol{\beta}$ is found by solving the least
squares normal equations. If the value of $\gamma$ is unknown, a Bayesian
analysis using sample information about $\gamma$ is performed to make
inferences about $\delta$. The estimated value of $\delta$, which Box and
Tiao have shown to have a $t$ distribution with $N - 2$ degrees of freedom,
is then tested for statistical significance.
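For known $\gamma$, one way to build the quantities in (3) is to "unwind" the cumulative shocks. Writing (1) and (2) together as $z = L\,\mathbf{1} + \delta\,\mathbf{d} + A\,a$, where $\mathbf{d}$ is 0 before and 1 after the program starts and $A$ is lower triangular with ones on the diagonal and $\gamma$ everywhere below it, and multiplying through by $A^{-1}$ gives $Y = A^{-1}z$, $X = A^{-1}[\mathbf{1}\;\mathbf{d}]$ and $e = a$, whose elements are independent. The Python sketch below assumes this construction; the helper names are hypothetical, it is not Box and Tiao's published code, it handles only the known-$\gamma$ case (no Bayesian analysis for unknown $\gamma$), and the simulated series is synthetic.

```python
import math
import random

def unwind(v, gamma):
    """Solve A w = v, where A has ones on the diagonal and gamma everywhere
    below it, by forward substitution: w_t = v_t - gamma * (w_1 + ... + w_{t-1})."""
    w, running = [], 0.0
    for value in v:
        wt = value - gamma * running
        w.append(wt)
        running += wt
    return w

def estimate_level_change(z, n1, gamma):
    """Hypothetical helper: least squares (L, delta) and the t statistic for
    delta in Y = X beta + e, for a KNOWN gamma."""
    n = len(z)
    y = unwind(z, gamma)                                  # Y = A^{-1} z
    x1 = unwind([1.0] * n, gamma)                         # column for L
    x2 = unwind([0.0] * n1 + [1.0] * (n - n1), gamma)     # column for delta
    # Normal equations (X'X) beta = X'Y, solved directly for two unknowns.
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    r1 = sum(a * c for a, c in zip(x1, y))
    r2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    level = (s22 * r1 - s12 * r2) / det
    delta = (s11 * r2 - s12 * r1) / det
    resid = [c - level * a - delta * b for a, b, c in zip(x1, x2, y)]
    s2 = sum(e * e for e in resid) / (n - 2)              # N - 2 degrees of freedom
    se_delta = math.sqrt(s2 * s11 / det)                  # [(X'X)^{-1}]_22 = s11 / det
    return level, delta, delta / se_delta

def simulate(rng, n1, n2, level, delta, gamma, sigma):
    """Synthetic series following equations (1) and (2)."""
    z, running = [], 0.0
    for t in range(n1 + n2):
        a = rng.gauss(0.0, sigma)
        z.append(level + (delta if t >= n1 else 0.0) + gamma * running + a)
        running += a
    return z

rng = random.Random(1965)
z = simulate(rng, n1=100, n2=100, level=50.0, delta=5.0, gamma=0.5, sigma=0.5)
level_hat, delta_hat, t_delta = estimate_level_change(z, n1=100, gamma=0.5)
print(round(level_hat, 2), round(delta_hat, 2), round(t_delta, 1))
```

Because the transformed columns decay geometrically by the factor $1 - \gamma$, when $\gamma$ is near 1 most of the information about $\delta$ comes from the first few observations after the program starts, whereas with $\gamma$ near 0 every post-program observation contributes almost equally. The resulting t statistic would be referred to a t table with $N - 2$ degrees of freedom.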

It should be noted that while these statistical procedures enable the
experimenter to tell whether or not a real change in the level of the
post series has occurred, they do not ipso facto establish causality
between program and effect. It could very well be that, simultaneously
with the introduction of the program, one or more other events took
place that had a greater impact on the health status of the community
than did the new program. One of these events could be the discovery
of a new therapeutic procedure, a new "wonder drug," or a new diet for
obese people. In the absence of a control series in parallel with the
post series, which would have helped to rule out the effects of such
extraneous events if they existed, one could accept causality with some
confidence only if it were assumed that over the duration of the series
no major events other than the new health program occurred to affect
the pattern of the series. The validity of this assumption could be
checked by experts familiar with the health sciences and with the
community where the health program was introduced.






References 

1. World Health Organization. Constitution of the World Health
Organization, Annex I. In The First Ten Years of the World Health
Organization. Geneva: World Health Organization, 1958.

2. World Health Organization. Measurement of Levels of Health. WHO
Technical Report Series No. 137. Geneva: World Health Organization,
1957.

3. Chen, K. The measurement of health: Issues, problems and
approaches. Unpublished manuscript.

4. Baumann, B. Diversities in conceptions of health and physical
fitness. Journal of Health and Human Behavior, 1961, 2, 39-46.

5. Katz, S., et al. Studies of illness in the aged: The Index of ADL, a
standardized measure of biological and psychosocial function. Journal
of the American Medical Association, 1963, 185, 914-919.

6. Apgar, V. A proposal for a new method of evaluation of the newborn
infant. Current Researches in Anesthesia and Analgesia, 1953, 32, 260.

7. Suchman, E. A. Evaluative Research: Principles and Practice in
Public Service and Social Action Programs. New York: Russell Sage
Foundation, 1967.

8. Harris, C. W. (Ed.). Problems in Measuring Change. Madison, Wis.:
The University of Wisconsin Press, 1963.

9. Cox, D. R. Planning of Experiments. New York: Wiley, 1958.

10. Box, G. E. P., & Tiao, G. C. A change in level of a non-stationary
time series. Biometrika, 1965, 52, 181-192.






